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Abstract 

Designs of experiments for multivariate case are reviewed. Fast algorithm 
of construction of good Latin hypercube designs is developed. 



1 Introduction 

The mathematical theory for designing experiments was started to develop by Sir 
Ronald A. Fisher who pioneered the design principles in his studies of analysis of 
variance originally in agriculture. The theory of experimental designs have received 
considerable development further in the middle of the twentieth century in works 
by G.E.P Box, J. Kiefcr and many others. Computer experiments have become 
available with the appearance of computer engineering. Mathematical computer 
models are a replacement for natural (physical, chemical, biological) experiments 
which are too time consuming or too costly. Moreover, mathematical models may 
describe phenomena which can not be reproduced, for example, weather modeling. 

Experimental designs for deterministic computer models was studied first by 
McKay et al. (1979). The theoretical principles of analysis of deterministic com- 
puter models were determined in Sacks et al. (1989) and the analysis of simulation 
models (deterministic computer codes with stochastic output) in Kleijnen (1987). 
During the last decade the Bayesian approach to computer experiments was ex- 
tensively developed, see Kennedy, O'Hagan (2001), Conti, O'Hagan (2008) and 
references within. The technique used in the Bayesian approach is close to Krig- 
ing in a manner that a special construction is used to interpolate the values of 
the output of the deterministic code rather than the values of a random field and 
uncertainty intervals for untried values of inputs are calculated; see Koehler, Owen 
(1996), Kennedy, O'Hagan (2001). One run by a computer model may require con- 
siderable time. Thus the main problem is to reduce an uncertainty of inferences 
on a computer model by making only a few runs. Consequently, we are faced with 
the problem of optimal choice of experimental conditions. 

The present paper is organized as follows. In Section 2 we review experimental 
designs for a multivariate case in order to choose the most appropriate criteria of 
optimality. In Section 3 we propose a fast algorithm for constructing good optimal 
designs for computer experiments. 
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2 Comparison of natural and computer experi- 
ments 

Basic features of natural and computer experiments are presented in the following 
two-column style. 
Natural experiments Computer experiments 



The response is observed with errors 
which may be correlated. 

The response is described either by 
known regression function with un- 
known parameters or by multivari- 
ate linear or quadratic model which 
is valid at a design subspace. 
A primary objective is to estimate 
parameters or find conditions which 
maximize response. 
Other aims are identifying variables 
which have a significant effect, etc. 

Optimal design is typically to min- 
imize the (generalized) variance of 
estimated characteristics. 
Optimal designs are, for example, 
factorial, incomplete block, orthog- 
onal, central composite, screening 
and D-optimal designs. 
Note that optimal designs for natural experiments mostly have two or three points 
in projection on each coordinate, e.g. the 2 k ~ p block and orthogonal designs have 
two points in projection, central composite design has three points in projection. 
This fact is a consequence of the multivariate linear or quadratic model which is 
assumed to be valid. Such designs are not suitable for computer experiments since 
we assume that the output may be highly nonlinear in several variables. Due to the 
objectives of computer experiments, optimal design should minimize mean square 
error between the prediction of response at untried inputs and the true output. 
This criterion leads to the optimal design which should fill an entire design space 
uniformly at the initial stage of computer experiments. The examples of space- 
filling design arc Latin hypercube design, sphere packing design, distance based 
design, uniform design, design based on random or pseudo-random sequences, see 
Santner et al. (2003), Fang et al. (2006). The optimal design should be a dense set 
in projection to each coordinate and should be a dense set in entire design space. 
Each of the above space-filling designs has attractive properties and satisfies some 
useful criterion. As far as is known, the best design should optimize a compound 
criterion. 



The output is deterministic. The 
running of a computer code at the 
same inputs gives the same output. 
A computer code is considered to be 
like as a black box. The main as- 
sumption is factor sparsity, that is 
the output depends in nonlinear way 
only on a few number of inputs 1 . 
A primary objective is to fit a 
cheaper unbiased low uncertain pre- 
dictor. 

Other aims are calibration of model 
parameters to physical data, opti- 
mization of output, etc. 
Optimality criteria is the minimiza- 
tion of mean square error over de- 
sign space or the maximization of 
entropy. 

Optimal design is space-filling de- 
sign. Latin hypercube design is rec- 
ommended in many papers. 



1 Without sparsity assumption we need a lot of runs to construct an unbiased low 
uncertain predictor. 
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3 Latin Hypercube Designs 

At first, we need to recall an algorithm for construction of LH designs, which 
was introduced in McKay et al. (1979). The algorithm generates n points in 
dimension d in the following manner. 1) Generate n uniform equidistant points 

(s) (s) 

x\ , . . . , Xn in the range of each input, s = 1, . . . , d. 2) Generate a matrix (pij) 
of size d x n such that each row is a random permutation of numbers 1, . . . ,n 
and these permutations are independent. 3) Each column of the matrix (pij) 

corresponds to a design point, that is (zpij >■■■■> x Pd,j) T is jth point of LHD. 
Without loss of generality, we assume that the range of each input is [0, 1] and 
\ s) eft={0,l/(n-l),2/(n-l),...,l} ; 
By construction, LHD has the best filling of range in projection on each coor- 
dinate. Unfortunately, LHD may have a poor filling of entire hypercube. Several 
criteria of optimality are introduced in order to choose a good LHD in a class of 
all LHD. Maximin criterion is a maximization of minimal distance 

( d 

min \\xi - Xj\L = min I V] \x sA - x s j \ p ) 

i,j=X,...,n — l,...,n \ z — ' / 

\s=l / 

usually used with p = 2 where Xi — (xi t i, . . . , Xd,i) T is ith point of design L. An 
LHD which maximize ^(L) is called by maximin LHD. Audze-Eglais criterion 
introduced in Audze, Eglais (1977) is a sum of forces between charged particles 
and is a minimization of 

n n 



Others criteria of uniformity are star L2-discrepancy, centered -^-discrepancy, 
wrap-around L2-discrepancy which are motivated by quasi-Monte-Carlo methods 
and the Koksma-Hlawka inequality, see Hickernell (1998), Fang et al. (2000). 
Algorithms of optimization are studied in a number of papers, the local search 
algorithm in Grosso et al. (2008), the enhanced stochastic evolutionary algorithm 
in Jin et al. (2005), the simulated annealing algorithm in Morris, Mitchell, (1995) 
the columnwise-pairwise procedure in Ye et al. (2000), the genetic algorithm in 
Liefvendahl, Stocki, (2006) and Bates et al. (2003), the collapsing method in Fang, 
Qin (2003). Cited authors concentrate on the case of low dimensions. 

Basing on an analysis of papers on computer experiments, we can say that the 
size of LHD is approximately equal to the input dimension multiplied by 10, that 
is ?i « 10d Further we propose a fast algorithm of constructing good LHD for the 
case of high dimensions which is not studied, to the best of our knowledge. 

First, we need to study features of random LHD generated by the above al- 
gorithm. Let L — {xi, . . . ,x n } be a LHD. Let be a minimal distance between 
Xi and other points of L; that is = min^; \ \xi — Xj\\2 (further we consider 
euclidian distances). These distances characterize a design L. Let Q a denote an 
a-percentile for sample r\, . . . ,r„. Averaged values of low and upper quartiles, 
Q0.25 and Qo.75) are presented in table [TJ We see that the inter-point distances 
are varied and the quarter of distances are quite small. Also note that distances 
between points is increased as the dimension is increased since n = lOd. 
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Table 1: Low and upper quartiles of distances between points of n-point random 
LHD for different dimensions, n = lOrf. 



d 


2 


3 


4 


5 


6 


7 


Qo.25 


0.108 


0.167 


0.232 


0.305 


0.368 


0.434 


Qo.75 


0.175 


0.270 


0.347 


0.431 


0.502 


0.573 


d 


8 


9 


10 


14 


20 




Qo.25 


0.494 


0.554 


0.610 


0.821 


1.096 




Qo.75 


0.636 


0.699 


0.757 


0.972 


1.249 





For construction of n-point LHD with a given inter-point distance r at dimen- 
sion d, we propose the following heuristic algorithm. 

Algorithm. 

1. Let Lk is a fc-point design at kth step. Let L\ = {x±} where x\ is a random 
point in the middle of lZ d such that its coordinates are unequal to each other. 

2. Compute a boolean matrix B — {Bij} of size d x n upon Lk such that 
bij = 1 ('used') if there exists a point in Lk with ith coordinate which 
equals (j — l)/(n — 1), and ('unused') otherwise. 

3. Generate a random point z = (qi, . . . ,qd) J '/(n — 1) G lZ d such that each 
coordinate is unused; that is bi qi = 0, i — l,...,d except one random 
coordinate which should be taken nearby 0.5. 

4. Create a set C of candidate points in TZ d with unused coordinates which are 
approximate points which are the closest and the furthest point from z lied 
on spheres S r (xj) with centers Xj € Lk and radius r, j = 1, . . . , k. 

5. Find a point x* G C such that x* lies outside of all S r (xj), that is | \x* —x 2 || > 
r, j = l,...,k. If there exist several such points, choose a point which 
minimizes ff{s : y^_] Bj, s = m*}, where m* = min J= i n y^,-_] Bjj and 
B = B{L k \J{x*}). 

6. Add x* to design, that is Lfc+i = Lk {J{x*}. Stop at nth step. 

7. If we could not find x* at step 5, go to step 3. If we could find x* after several 
trials, we should decrease r since it is impossible to find a point which is far 
from Lk at given distance r. 

Let a design obtained by Algorithm be called SLHD. Numerical results show 
that Algorithm is fast and work well for any dimension. It requires 40 seconds to 
compute 100-point SLHD at dimension d = 10 and 60 seconds for 140-point SLHD 
at d = 14 and 200 seconds for 200-point SLHD at d = 20 on PC 2.1GHz. The 
choice of r should be smaller than r* where r* is the minimal distance between 
points of exact maximin LHD. Since r* is unknown, we recommend the running 
Algorithm with different r, say, start with Qo.75 f° r random LHD and increase it 
by small increment. The decreasing of r at step 7 does not mean that SLHD does 
not exist for given r and is a consequence a poor placement of points at previous 
iterations. 

Features of SLHD are presented in Table Values of r* are taken from web- 
site [MtpiT/wwwlspa^fillingdesignsml^ We see that 90% of inter-point distances 
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Table 2: Low and upper quartiles and Qo.i of distances between points of n-point 
SLHD for different dimensions, n = lOd. The value r* of maximin LHD is given. 



d 


2 


3 


4 


5 


6 


7 


Qo.i 


0.217 


0.310 


0.351 


0.393 


0.512 


0.584 


Qo.25 


0.217 


0.312 


0.363 


0.476 


0.535 


0.617 


Qo.75 


0.217 


0.323 


0.409 


0.486 


0.552 


0.626 


r* 


0.223 


0.360 


0.476 


0.589 


0.687 


0.779 


d 


8 


9 


10 


14 


20 




Qo.i 


0.679 


0.763 


0.823 


1.035 


1.268 




Qo.25 


0.694 


0.765 


0.824 


1.037 


1.271 




Qo.75 


0.706 


0.774 


0.836 


1.045 


1.281 




r* 


0.867 


0.950 


1.021 









of SLHD arc higher than the most of distances at random LHD. Thus SLHD has 
a better filling of entire design space. Figure Q] display points of SLHD for d = 2 
and d = 20. We can see a quite uniform filling of square. Further improvement 
of experimental design can be done by applying the local search or the simulated 
annealing algorithm. 
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Figure 1: The points of SLHD for d — 2 with the order of including (left) and two 
coordinates of points of SLHD for d = 20 (right). 



4 Conclusion 

The algorithm of construction of LHD with given inter-point distance is con- 
structed and studied. By the algorithm we can quickly compute LHD such that 
the most of inter-point distances are larger than distances at random LHD. The 
proposed algorithm is more efficient than simply generate many random LHDs 
and choose the best one. 
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